GStreamer ❤ Windows: A primer on the cool stuff you’ll find in the 1.20 release

Seungha Yang
Dec 17, 2021


The GStreamer community keeps focusing its efforts on improving Windows support and continues to add fascinating new features for Windows. GStreamer is about to publish a new stable release (1.20) very soon, so you may want to know what’s new on the Windows front 😊

Of course there are not only new features: there have also been many bug fixes and enhancements since the previous 1.18 stable release series. You will find noticeably more stable and better optimized Windows-specific elements and features in GStreamer 1.20.

What’s new?

  • A new desktop capture element named d3d11screencapturesrc, including a handy GstDeviceProvider implementation to enumerate/select target monitors for capture
  • The Direct3D11/DXVA decoder now supports the AV1 and MPEG2 codecs
  • VP9 decoding got more reliable and stable thanks to a newly written codec parser
  • Support for decoding interlaced H.264/AVC streams
  • Hardware-accelerated video deinterlacing
  • Video mixing with the Direct3D11 API
  • MediaFoundation API based hardware encoders gained the ability to receive Direct3D11 textures as input

New Windows Desktop/Screen Capture Element

There is a new implementation of screen capture for Windows based on the Desktop Duplication API. This new implementation will likely perform better than the other Windows screen capture elements, provided you are running a sufficiently recent Windows version (Windows 10 or 11).

You may know that a Desktop Duplication API based element, dxgiscreencapsrc, was already added in the 1.18 release. However, after the 1.18 release I found various design issues with it and decided to rewrite the element to perform better and be nicer/cleaner. I expect the old dxgiscreencapsrc element will be deprecated soon, and the newly implemented d3d11screencapturesrc will then become the primary Windows desktop capture element.

What’s better than dxgiscreencapsrc?

  • Multiple capture instances: One known limitation of the Desktop Duplication API is that only a single capture session per physical monitor is allowed in a single process (capturing multiple different monitors in a single process is allowed, though). Because of this limitation, you could not configure multiple dxgiscreencapsrc elements in your application to capture the same monitor.
    To overcome the limitation, the new implementation holds a dedicated per-monitor capture object that behaves like a server. Each d3d11screencapturesrc element then requests frames from that capture object, following a single-server/multiple-client model. As a result of this new design, you can place multiple d3d11screencapturesrc elements in your application to capture the same monitor.
  • Performance improvement: The new element can pass the captured Direct3D11 texture on as-is, without copying it into system memory. This is a major performance win when d3d11screencapturesrc is linked with Direct3D11-aware elements. Of course, d3d11screencapturesrc can still be linked with non-Direct3D11 elements, just like the old dxgiscreencapsrc.
  • Easier and fancier target monitor selection: The GstDeviceProvider implementation for this element makes it easy to enumerate the monitors you can capture. Alternatively, your application may have its own monitor enumeration method; no worries, you can also specify the target monitor explicitly by passing an HMONITOR handle to d3d11screencapturesrc.
    Wondering which monitors can be captured via the new element? Just run gst-device-monitor-1.0.
    See the example below. You can pass the Source/Monitor device class to gst-device-monitor-1.0 so that only monitor capture devices are shown.
gst-device-monitor-1.0.exe Source/Monitor
Probing devices...

Device found:
    name  : Generic PnP Monitor
    class : Source/Monitor
    caps  : video/x-raw(memory:D3D11Memory), format=BGRA, width=2560, height=1440, framerate=[ 0/1, 2147483647/1 ]
            video/x-raw, format=BGRA, width=2560, height=1440, framerate=[ 0/1, 2147483647/1 ]
    properties:
        device.api = d3d11
        device.name = "\\\\.\\DISPLAY1"
        device.path = "\\\\\?\\DISPLAY\#CMN152A\#5\&15b18d46\&0\&UID512\#\{e6f07b5f-ee97-4a90-b076-33f57bf4eaa7\}"
        device.primary = true
        device.type = internal
        device.hmonitor = 65537
        device.adapter.luid = 56049
        device.adapter.description = "AMD\ Radeon\(TM\)\ Graphics"
        desktop.coordinates.left = 0
        desktop.coordinates.top = 0
        desktop.coordinates.right = 1707
        desktop.coordinates.bottom = 960
        display.coordinates.left = 0
        display.coordinates.top = 0
        display.coordinates.right = 2560
        display.coordinates.bottom = 1440
    gst-launch-1.0 d3d11screencapturesrc monitor-handle=65537 ! ...
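To illustrate the zero-copy path described above, here is a hedged sketch of two possible pipelines (the file name is a placeholder; element availability depends on your GStreamer build and GPU):

```shell
# Capture the default monitor and render it with the Direct3D11 sink;
# the texture can stay in GPU memory end-to-end (zero-copy).
gst-launch-1.0 d3d11screencapturesrc show-cursor=true ! d3d11videosink

# The element can also feed non-Direct3D11 elements: here the frames are
# downloaded to system memory and encoded with the software x264 encoder.
gst-launch-1.0 d3d11screencapturesrc ! d3d11convert ! d3d11download ! \
    videoconvert ! x264enc ! matroskamux ! filesink location=capture.mkv
```

The first pipeline keeps everything on the GPU; the second shows that linking with non-Direct3D11 elements still works, at the cost of a download to system memory.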

You may also be interested to learn that Andoni Morales Alastruey is working on an awesome feature: extending d3d11screencapturesrc so that you can capture a specific window instead of the entire desktop area belonging to a physical monitor. I expect it will be one of the coolest things we will see in the future, although it will likely not make it into the upcoming 1.20 release.

AV1 and MPEG2 Codec Support through Direct3D11/DXVA

After we introduced a new design/infrastructure for hardware-accelerated stateless video decoding into GStreamer (see also the blog post written by Víctor Jáquez), the GStreamer community has been focusing on this new design, called GstCodecs.

As described in Víctor’s blog post, I initially wanted to support hardware-accelerated video decoding for various codecs through Direct3D11/DXVA, and I implemented it based on Chromium’s code, including a base class designed to be easily extensible to other APIs, such as VA-API, NVDEC, and V4L2 stateless codecs. After that, we (Nicolas Dufresne, Víctor Jáquez, He Junyan and I) worked together on the infrastructure to make it more stable and mature.

Since then, He Junyan and Víctor Jáquez have implemented infrastructure for the AV1 and MPEG2 codecs as well. Thanks to that work, our Direct3D11/DXVA implementation was able to adopt this support seamlessly. Now you can use the d3d11av1dec and d3d11mpeg2dec elements (only when your GPU supports those codecs, of course).
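As a hedged sketch, a pipeline using the new AV1 decoder might look like this (the file name is a placeholder, and the pipeline assumes a GPU with AV1 decoding support):

```shell
# Decode an AV1 stream with the Direct3D11/DXVA decoder and render it
# on the GPU; av1parse (new in 1.20) prepares the stream for the decoder.
gst-launch-1.0 filesrc location=video.webm ! matroskademux ! av1parse ! \
    d3d11av1dec ! d3d11videosink
```

A d3d11mpeg2dec pipeline would look analogous, with an MPEG2 demuxer/parser in front of the decoder.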

Stabilized VP9 decoding via Newly Written Bitstream Parser

GStreamer provides functionality to parse compressed video/audio bitstreams, AVC/HEVC/VP9 codec streams for example. One major use case for the compressed video bitstream parser is stateless video decoding APIs, such as DXVA and VA-API.

Historically, when such bitstream parsers were written, GStreamer-vaapi was the main consumer of their output and therefore they focused on being VA-API friendly.

But things have changed. Nowadays the trend in GStreamer is towards stateless decoding implementations. That means our parser implementations are becoming more generic, and they keep improving to support various use cases via the newly implemented GstCodecs infrastructure.

For end-users of hardware-accelerated video decoding, we also strongly recommend the newly written implementations (GstVA over GStreamer-VAAPI, or nvh264sldec over nvh264dec, for example). I expect our new GstCodecs-based decoder elements will become the primary implementations and be promoted over the existing ones in the near future.

Back to the VP9 decoding story: when I worked on Direct3D11/DXVA based VP9 decoding, I found that the existing VP9 bitstream parsing library was too VA-API specific and lacked functionality required by other stateless video decoding APIs (DXVA and NVDEC specifically). After carefully reading the VP9 specification and the DXVA VP9 specification documents, I decided to rewrite the parsing library to make it more generic and cleaner. Now, in my testing, the newly written stateless VP9 video decoders achieve a better compliance score than before with the new VP9 parser.

Interlaced H.264/AVC Decoding Support

As I mentioned above, I originally implemented the GstCodecs infrastructure based on Chromium’s code base. That was clean and lean in some aspects, but the Chromium code base didn’t support interlaced H.264/AVC decoding.

So, in order to support decoding interlaced H.264/AVC streams, I had to refactor the GstCodecs code, and the change applies not only to Direct3D11/DXVA but also to VA-API.

Recently I also added interlaced decoding support to the newly written NVIDIA stateless H.264/AVC decoder.

Hardware Accelerated Video Deinterlacing

As a follow-up: now that we have the ability to decode interlaced H.264/AVC streams, we also need to be able to process those streams and deinterlace them for rendering. You now get this functionality with the d3d11deinterlace element.

NOTE: The new d3d11deinterlace element should work well not only with streams decoded by a Direct3D11/DXVA decoder, but with any interlaced stream, including those produced by software-based decoders.
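A hedged sketch of both cases (file names are placeholders, and the exact demuxer depends on your container):

```shell
# Decode an interlaced H.264 stream with the Direct3D11/DXVA decoder and
# deinterlace it on the GPU before rendering.
gst-launch-1.0 filesrc location=interlaced.ts ! tsdemux ! h264parse ! \
    d3d11h264dec ! d3d11deinterlace ! d3d11videosink

# The same works after a software decoder: frames are uploaded to
# GPU memory first, then deinterlaced.
gst-launch-1.0 filesrc location=interlaced.ts ! tsdemux ! h264parse ! \
    avdec_h264 ! d3d11upload ! d3d11deinterlace ! d3d11videosink
```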

Direct3D11 based Video Mixing/Composing

The d3d11compositor element was added to support composing/mixing multiple video streams into one, like the compositor (software implementation) or glvideomixer (GPU based, but for OpenGL) elements do. If you use other Direct3D11-aware elements in your pipeline, I recommend using d3d11compositor for video compositing; it will likely be the best candidate in terms of performance.
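As a minimal sketch, two test streams could be mixed like this (per-pad positioning properties are assumed to mirror those of compositor/glvideomixer):

```shell
# Mix two test streams side by side with d3d11compositor; the second
# input pad is shifted right by 320 pixels via a per-pad property.
gst-launch-1.0 d3d11compositor name=c sink_1::xpos=320 ! d3d11videosink \
    videotestsrc ! d3d11upload ! c. \
    videotestsrc pattern=ball ! d3d11upload ! c.
```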

MediaFoundation Video Encoders got faster

See my blog post. You will likely see better performance than before (1.18) if you use a Direct3D11/DXVA decoder together with a MediaFoundation video encoder.
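A hedged transcoding sketch of such a decoder/encoder pair (file names are placeholders; in 1.20 the decoded Direct3D11 textures can be handed to the MediaFoundation encoder without a round-trip through system memory, hardware permitting):

```shell
# Transcode H.264 to H.265 using the Direct3D11/DXVA decoder and the
# MediaFoundation hardware encoder.
gst-launch-1.0 filesrc location=in.mp4 ! qtdemux ! h264parse ! \
    d3d11h264dec ! mfh265enc ! h265parse ! mp4mux ! \
    filesink location=out.mp4
```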

Acknowledgements

The improvements mentioned above were not made by me alone, but are the result of teamwork with other GStreamer developers. Special thanks to Nicolas Dufresne, Víctor Jáquez and He Junyan.

What’s next on the TO-DO list?

  • Direct3D12: This is the next-generation technology on Windows, and I have already implemented a proof-of-concept decoder which works well with AMD GPUs. Although it needs some enhancement, I expect we will end up needing Direct3D12 based video processing if your application is based on Direct3D12, or if you need more advanced features that the existing Direct3D11 based implementation cannot provide.
  • Add more GPU vendor-specific encoder API support: Recently I found that the existing Intel MSDK plugin could perform much better on Windows and could be more featureful on both Windows and Linux, but in my view it needed to be redesigned, which I verified via my proposal. There is also an AMD GPU specific approach I am looking at. Why not NVIDIA? I already know how GStreamer’s NVIDIA encoder can be improved for Windows.
  • Direct3D Compute Shader (a.k.a. DirectCompute): This would be a very nice feature to support for general-purpose computation tasks, like most of the things CUDA or OpenCL can do (HDR tone-mapping, for example).

Anything you want for Windows? Please ping me, I will likely be there 😁
